
Pareto optimal set




during learning, numerical precision reduction and for finding the Pareto optimal set of configurations apply directly

Neural Information Processing Systems

We would like to thank the reviewers for their thoughtful comments and valuable suggestions. We will clarify this point in the paper. Our algorithms are agnostic to the leaf distributions used. Thanks for this valuable feedback; we will improve the pseudocode as you suggest. As such, there is memory overhead but no computational overhead.


Preference-based Pure Exploration

Shukla, Apurv, Basu, Debabrota

arXiv.org Machine Learning

We study the preference-based pure exploration problem for bandits with vector-valued rewards. The rewards are ordered using a (given) preference cone $\mathcal{C}$, and the goal is to identify the set of Pareto optimal arms. First, to quantify the impact of preferences, we derive a novel lower bound on the sample complexity of identifying the most preferred policy with confidence level $1-\delta$. Our lower bound highlights the role played by the geometry of the preference cone and underscores the difference in hardness compared to existing best-arm identification variants of the problem. We further explicate this geometry when rewards follow Gaussian distributions. We then provide a convex relaxation of the lower bound and leverage it to design the Preference-based Track and Stop (PreTS) algorithm, which identifies the most preferred policy. Finally, we show that the sample complexity of PreTS is asymptotically tight by deriving a new concentration inequality for vector-valued rewards.
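To make the cone-based ordering concrete, the following is a minimal Python sketch that lists the Pareto optimal arms when the mean reward vectors are already known. It assumes a polyhedral cone written as $\mathcal{C} = \{z : Wz \ge 0\}$, with arm $i$ dominating arm $j$ when $\mu_i - \mu_j \in \mathcal{C}$ and $\mu_i \ne \mu_j$; the matrix `W`, the function names, and the example means are hypothetical. It does not implement PreTS, which has to identify this set from noisy samples.

```python
import numpy as np


def cone_dominates(w, y, x):
    """Return True if mean vector y dominates x under the cone C = {z : W z >= 0}.

    Assumption (for illustration): y dominates x when y - x lies in C and y != x.
    """
    diff = np.asarray(y, dtype=float) - np.asarray(x, dtype=float)
    return bool(np.all(w @ diff >= 0) and np.any(diff != 0))


def pareto_optimal_arms(means, w=None):
    """Indices of arms whose mean vectors no other arm dominates under C.

    means: (K, d) array of known mean reward vectors (hypothetical input).
    w: (m, d) matrix describing C (hypothetical); defaults to the identity,
       i.e. the positive orthant, which gives componentwise Pareto dominance.
    """
    means = np.asarray(means, dtype=float)
    k, d = means.shape
    w = np.eye(d) if w is None else np.asarray(w, dtype=float)
    optimal = []
    for i in range(k):
        # Arm i is Pareto optimal if no other arm dominates it under C.
        dominated = any(
            cone_dominates(w, means[j], means[i]) for j in range(k) if j != i
        )
        if not dominated:
            optimal.append(i)
    return optimal


means = [[1.0, 2.0], [2.0, 1.0], [1.9, 1.9]]
print(pareto_optimal_arms(means))         # orthant cone: [0, 1, 2]
wider = [[1.0, 2.0], [2.0, 1.0]]          # a cone strictly containing the orthant
print(pareto_optimal_arms(means, wider))  # [2]
```

Since a wider cone only adds dominance relations, the Pareto optimal set can only shrink as $\mathcal{C}$ grows, as the two calls above illustrate.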